Intrinsic curiosity method based on reward prediction error
Qing TAN, Hui LI, Haolin WU, Zhuang WANG, Shuchao DENG
Journal of Computer Applications, 2022, 42(6): 1822-1828. DOI: 10.11772/j.issn.1001-9081.2021040552
Abstract

Concerning the problem that, when the state prediction error is used directly as the intrinsic curiosity reward, a reinforcement learning agent cannot explore the environment effectively in tasks where state novelty is weakly correlated with reward, an Intrinsic Curiosity Module with Reward Prediction Error (RPE-ICM) was proposed. In RPE-ICM, a Reward Prediction Error network (RPE-Network) was used to learn and correct the state prediction error reward, and the output of the Reward Prediction Error (RPE) model was used as the intrinsic reward signal to balance over-exploration and under-exploration, so that the agent could explore the environment more effectively and exploit the reward to learn skills, achieving better learning performance. Comparative experiments were conducted on RPE-ICM, the Intrinsic Curiosity Module (ICM), Random Network Distillation (RND) and the traditional Deep Deterministic Policy Gradient (DDPG) algorithm in different MuJoCo (Multi-Joint dynamics with Contact) environments. The results show that, compared with traditional DDPG, ICM-DDPG and RND-DDPG, the DDPG algorithm based on RPE-ICM improves the average performance by 13.85%, 13.34% and 20.80% respectively in the Hopper environment.
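The abstract describes the mechanism only at a high level. The following is a minimal PyTorch sketch of one plausible reading of it: an ICM-style forward model produces a state prediction error, and a separate reward-prediction network's error is used to correct that bonus before it is added to the extrinsic reward for DDPG. The layer sizes, the tanh modulation, and the coefficient eta are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RPEICMSketch(nn.Module):
    """Illustrative intrinsic-reward module: curiosity bonus corrected by reward prediction error."""
    def __init__(self, state_dim, action_dim, hidden=128, eta=0.01):
        super().__init__()
        self.eta = eta  # scale of the intrinsic reward (assumed value)
        # Forward model: predicts the next state from (state, action), as in ICM.
        self.forward_model = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
        # Reward-prediction network: predicts the extrinsic reward from (state, action).
        self.reward_model = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def intrinsic_reward(self, state, action, next_state, ext_reward):
        sa = torch.cat([state, action], dim=-1)
        # Classic curiosity signal: error of the forward model on the next state.
        state_err = F.mse_loss(self.forward_model(sa), next_state, reduction="none").mean(-1)
        # Reward prediction error, used here to damp the bonus where state novelty
        # says little about reward (an assumption about how the correction works).
        rew_err = (self.reward_model(sa).squeeze(-1) - ext_reward).abs()
        return self.eta * state_err * torch.tanh(rew_err)

    def loss(self, state, action, next_state, ext_reward):
        # Both auxiliary models are trained by regression on transitions from the replay buffer.
        sa = torch.cat([state, action], dim=-1)
        fwd_loss = F.mse_loss(self.forward_model(sa), next_state)
        rew_loss = F.mse_loss(self.reward_model(sa).squeeze(-1), ext_reward)
        return fwd_loss + rew_loss

In use, the total reward fed to the DDPG critic would be the extrinsic reward plus the output of intrinsic_reward, with the module's loss optimized alongside the actor and critic updates.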
